This is the final submission on the phenomenon of Earth's lightning hotspots. As this is the last submission, it mostly covers the algorithms used during the semester that returned correct results. All plots are present, and all the code that was used is here. Let's load in all the needed packages!
At this point the simplified output has been created, which makes for a faster workflow. The only steps that need to be done are:
This was the point of the key paper, so let's see how close we can get to it!
Its source is the .h5 files available on kooplex. The simplified output does not need to be recreated, as I will hand it out. The problem is that the .h5 files are a bit messy to handle: they hold an insane amount of data and a dozen recorded parameters each. That's why the simplified output was created during the semester: we only need a fraction of the data. Later on we will discover another issue, hidden from plain sight...
import numpy as np
import pandas as pd

#let's load in the simplified data!
col_names = ["time","long","lat"]
simplified_data = pd.read_csv("../data/simplified/simplified.csv", skiprows=4, delimiter=" ", names=col_names)
print(simplified_data.info(), simplified_data.values.shape)
xedges_0 = np.arange(-180,181)
yedges_0 = np.arange(-54,55)
#these are arbitrarily chosen
xedges_1 = np.linspace(-180,180,int(12742/6/2)+1)
yedges_1 = np.linspace(-54,54,int(12742/6/2 * (54/180))+1)
#generate the hist2d
HIST2D, x_edges, y_edges = np.histogram2d(simplified_data["long"].values,simplified_data["lat"].values,
bins=(xedges_1,yedges_1))
print(HIST2D.shape)
HIST2D = HIST2D.T #transpose it, as imshow and pcolormesh expect rows=lat, columns=long
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 798991 entries, 0 to 798990
Data columns (total 3 columns):
 #   Column  Non-Null Count   Dtype
---  ------  --------------   -----
 0   time    798991 non-null  float64
 1   long    798991 non-null  float64
 2   lat     798991 non-null  float64
dtypes: float64(3)
memory usage: 18.3 MB
None (798991, 3)
(1061, 318)
plot_my_first_figure()
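Once the histogram is transposed, the fullest bins can be read off directly. The following is only a minimal sketch of that idea, not the method used for the figures: the helper name `top_hotspots` and the synthetic points are assumptions made for illustration, and the bin edges are the 1-degree ones from above.

```python
import numpy as np

def top_hotspots(hist2d, x_edges, y_edges, n=5):
    """Return (lon_center, lat_center, count) for the n fullest bins.

    hist2d is expected in (lat, lon) order, i.e. already transposed.
    """
    # indices of the n largest bins in the flattened array, largest first
    flat_order = np.argsort(hist2d, axis=None)[::-1][:n]
    rows, cols = np.unravel_index(flat_order, hist2d.shape)
    x_centers = 0.5 * (x_edges[:-1] + x_edges[1:])
    y_centers = 0.5 * (y_edges[:-1] + y_edges[1:])
    return [(float(x_centers[c]), float(y_centers[r]), int(hist2d[r, c]))
            for r, c in zip(rows, cols)]

# synthetic stand-in for the flash coordinates (NOT real LIS data):
# uniform background plus an artificial cluster at long=10.5, lat=20.5
rng = np.random.default_rng(0)
lon = np.concatenate([rng.uniform(-180, 180, 1000), np.full(200, 10.5)])
lat = np.concatenate([rng.uniform(-54, 54, 1000), np.full(200, 20.5)])

x_edges = np.arange(-180, 181)
y_edges = np.arange(-54, 55)
hist, _, _ = np.histogram2d(lon, lat, bins=(x_edges, y_edges))
hist = hist.T  # (lat, lon) order, as above

hotspots = top_hotspots(hist, x_edges, y_edges, n=3)
print(hotspots[0])
```

With the real data, `hist` would be replaced by `HIST2D` and the edges by `x_edges`, `y_edges` from the cell above.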
There will be a lot of masks, built from shapefiles for the reasonably well-defined surfaces. What is being used here is the following:
plot_my_second_figure()
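To illustrate how a mask over the histogram grid can be built, here is a small sketch. It is an assumption-laden stand-in: instead of reading a shapefile, it uses a hand-written rectangle (a hypothetical outline, not a real continent boundary) and tests each bin center against it with `matplotlib.path.Path.contains_points`. The helper name `polygon_mask` is made up for this example.

```python
import numpy as np
from matplotlib.path import Path

def polygon_mask(x_edges, y_edges, polygon_xy):
    """Boolean mask in (lat, lon) order: True where the bin center is inside."""
    x_centers = 0.5 * (x_edges[:-1] + x_edges[1:])
    y_centers = 0.5 * (y_edges[:-1] + y_edges[1:])
    xx, yy = np.meshgrid(x_centers, y_centers)  # both have shape (n_lat, n_lon)
    points = np.column_stack([xx.ravel(), yy.ravel()])
    inside = Path(polygon_xy).contains_points(points)
    return inside.reshape(xx.shape)

x_edges = np.arange(-180, 181)
y_edges = np.arange(-54, 55)

# HYPOTHETICAL outline: a plain lon/lat box, standing in for a shapefile polygon
box = [(-20, -35), (52, -35), (52, 37), (-20, 37)]
mask = polygon_mask(x_edges, y_edges, box)

# typical use: keep only the bins under the mask
dummy_hist = np.ones((len(y_edges) - 1, len(x_edges) - 1))
masked_hist = np.where(mask, dummy_hist, np.nan)
print(mask.shape, int(mask.sum()))
```

A polygon taken from a shapefile could be passed in the same way, as a list of (long, lat) vertices.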
We can compare this to the key paper's results, and it seems somewhat consistent. The key paper states that the analysis could be repeated yearly in order to track climate change. Also, anyone who is good at topography can see that the hotspot locations mostly fall under one climate on each continent. Most hotspots are located at:
In order to get a deeper understanding and give an explanation, we have to compare these results to other datasets and draw conclusions about what may be behind the phenomenon. Personally, I like the idea of wind pushing humid air over dry land, where it meets an obstructing surface (mountains). This means that the wind speed is generally lower in these areas. The problem is that most wind data is either a forecast, should not be used for scientific models, or is behind a paywall. In this case I have chosen a dataset that is for observation only but should be good enough here!
The problem with using wind measurements is that we instinctively know that wind speed and direction change even within a day, while the lightning hotspots are observed over a longer time interval. In order to choose the best day (at least) to describe, we need to do a seasonal or daily analysis. Let's get onto this!
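A daily or monthly flash count can be sketched roughly as follows. This is only an illustration under a loud assumption: that the `time` column holds seconds since 1993-01-01 (the TAI93 convention), with leap seconds ignored. If the simplified file stores time differently, the epoch must be changed. The helper `flashes_per_period` and the three demo rows are made up for the example.

```python
import pandas as pd

# ASSUMPTION: "time" is seconds since 1993-01-01 (TAI93), leap seconds ignored
TAI93_EPOCH = pd.Timestamp("1993-01-01")

def flashes_per_period(df, freq="D"):
    """Count rows per calendar period ("D" = day, "M" = month)."""
    stamps = TAI93_EPOCH + pd.to_timedelta(df["time"], unit="s")
    return stamps.dt.to_period(freq).value_counts().sort_index()

def seconds_since_epoch(text):
    """Helper for the demo only: turn a date string into the assumed time unit."""
    return (pd.Timestamp(text) - TAI93_EPOCH).total_seconds()

# made-up rows, NOT real LIS flashes
demo = pd.DataFrame({
    "time": [seconds_since_epoch("2020-01-20 06:00"),
             seconds_since_epoch("2020-01-20 18:00"),
             seconds_since_epoch("2020-03-16 12:00")],
    "long": [130.0, 131.0, -60.0],
    "lat": [-15.0, -16.0, -5.0],
})

daily = flashes_per_period(demo, "D")
monthly = flashes_per_period(demo, "M")
print(daily)
print(monthly)
```

The day with the highest count inside a continent mask would then be a natural candidate for the wind comparison.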
plot_my_third_figure()
plot_my_fourth_figure()
${\bf \text{Whoops, something is not okay!}}$
The problem is that the data available on kooplex only runs until 10/06. On top of that, NASA's data warehouse for the LIS ISS data moved from version 1.0 to 2.0, so the older data (and with it the missing part of 2020) is hardly accessible.
${\bf \text{But this data covers significant part of the year!}}$
The wind data sources used are here:
https://data.remss.com/ccmp/v02.1.NRT/
https://images.remss.com/figures/measurements/ccmp/Mears_2019_CCMP_NRT_JGR.pdf
The problem with this dataset is that it uses the netCDF4 file format.
#preps
path_w = "E:/_ELTE_PHYS_MSC/3_third_semester/datascience/data/wind_data/my_days/"
file1 = "CCMP_RT_Wind_Analysis_20200120_V02.1_L3.0_RSS.nc"
file2 = "CCMP_RT_Wind_Analysis_20200316_V02.1_L3.0_RSS.nc"
file3 = "CCMP_RT_Wind_Analysis_20200613_V02.1_L3.0_RSS.nc"
file4 = "CCMP_RT_Wind_Analysis_20200718_V02.1_L3.0_RSS.nc"
file5 = "CCMP_RT_Wind_Analysis_20200728_V02.1_L3.0_RSS.nc"
file6 = "CCMP_RT_Wind_Analysis_20200916_V02.1_L3.0_RSS.nc"
#load in all six files (one day per continent)
au_wind = netCDF4.Dataset(path_w + file1, "r", format="NETCDF4")
sa_wind = netCDF4.Dataset(path_w + file2, "r", format="NETCDF4")
eu_wind = netCDF4.Dataset(path_w + file3, "r", format="NETCDF4")
na_wind = netCDF4.Dataset(path_w + file4, "r", format="NETCDF4")
as_wind = netCDF4.Dataset(path_w + file5, "r", format="NETCDF4")
af_wind = netCDF4.Dataset(path_w + file6, "r", format="NETCDF4")
#collect the datasets in one list
net_cdfs = [au_wind, sa_wind, eu_wind, na_wind, as_wind, af_wind]
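Since the argument above is about wind speed, it may help to show how speed and direction follow from the two wind components. This is a generic sketch on plain NumPy arrays, not the code behind the stream figure. That the CCMP files store the components under names such as `uwnd` and `vwnd` is an assumption to check against the opened datasets, and the function name is hypothetical.

```python
import numpy as np

def wind_speed_and_direction(u, v):
    """Speed and meteorological direction from eastward (u) and northward (v) wind.

    Direction is where the wind blows FROM, in degrees clockwise from north.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    speed = np.hypot(u, v)
    direction = np.degrees(np.arctan2(-u, -v)) % 360.0
    return speed, direction

# small made-up component fields (m/s), NOT taken from the CCMP files
u_demo = np.array([3.0, 0.0, -1.0, 1.0])
v_demo = np.array([4.0, -1.0, 0.0, 0.0])

speed, direction = wind_speed_and_direction(u_demo, v_demo)
print(speed)
print(direction)
```

With the real files, `u_demo` and `v_demo` would be replaced by one time slice of the two component variables read from a dataset in `net_cdfs`.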
plot_my_stream_figure()
plot_my_sixth_figure()
plot_my_seventh_figure()
plot_my_eight_figure()
[1] ${\bf \text{Where Are the Lightning Hotspots on Earth?}}$, Rachel I. Albrecht, Steven J. Goodman, Dennis E. Buechler, Richard J. Blakeslee, and Hugh J. Christian. 01 Nov. 2016.
[2] ${\bf \text{Remote Sensing Systems Cross-Calibrated Multi-Platform (CCMP) 6-hourly ocean vector wind analysis}}$ product on 0.25 deg grid, Version 2.0, Wentz, F.J., J. Scott, R. Hoffman, M. Leidner, R. Atlas, J. Ardizzone, 2015: Remote Sensing Systems, Santa Rosa, CA.
Links:
LIS ISS data online: https://ghrc.nsstc.nasa.gov/lightning/data/data_lis_iss.html
WIND DATASET: https://data.remss.com/ccmp/v02.1.NRT/
Github repository for the work: https://github.com/AdamGTaylor/DataScience_Lab_2021
Great 3D map for Wind dataset: https://www.nnvl.noaa.gov/weatherview/index.html